BUG: read_excel trailing blank rows and columns #41227
ahawryluk commented Apr 29, 2021
- closes BUG: some read_excel engines still load trailing blank cells #41167
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Not an expert here, but could you run asvs for read_excel?
@phofl here are the asvs on my machine (asv run -E existing --bench ReadExcel)
The asv test data has no trailing cells, so we don't see a measurable impact. I also tested both branches on a sample .xlsx file with 1000 rows × 2 columns and a single formatted cell on row 0, column 2**14. master 2.84 s
small comment.
cc @rhshadrach ok here?
doc/source/whatsnew/v1.3.0.rst
@@ -799,6 +799,7 @@ I/O
- Bug in :meth:`DataFrame.to_string` misplacing the truncation column when ``index=False`` (:issue:`40907`)
- Bug in :func:`read_orc` always raising ``AttributeError`` (:issue:`40918`)
- Bug in the conversion from pyarrow to pandas (e.g. for reading Parquet) with nullable dtypes and a pyarrow array whose data buffer size is not a multiple of dtype size (:issue:`40896`)
- Bug in :func:`read_excel` loading trailing empty rows/columns for some filetypes (:issue:`41167`)
can you put this next to this one (or combine them, as i think they are basically the same)
Bug in :func:`read_excel` dropping empty values from single-column spreadsheets (:issue:`39808`)
I moved the whatsnew line, but left the two items separate since one bug dropped NaNs within the data and the other bug loaded extra NaNs outside the data. Thanks for reviewing this.
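To illustrate the distinction between the two bugs, here is a minimal pure-Python sketch. It is not the pandas implementation: the helper `trim_trailing_blank` is hypothetical, and it assumes a sheet is a list of rows in which `None` or an empty string marks a blank cell. Blank cells outside the data extent are dropped, while blank cells inside the data are kept.

```python
from typing import Any, List, Optional

Row = List[Optional[Any]]


def trim_trailing_blank(rows: List[Row]) -> List[Row]:
    """Drop trailing all-blank rows and columns; keep interior blanks.

    Hypothetical helper for illustration only, not the pandas code.
    A cell counts as blank when it is None or an empty string.
    """

    def is_blank(cell: Optional[Any]) -> bool:
        return cell is None or cell == ""

    # Locate the last row and last column that hold any real value.
    last_row = -1
    last_col = -1
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if not is_blank(cell):
                last_row = max(last_row, r)
                last_col = max(last_col, c)

    # Slice every row to the data extent, padding rows that are too short.
    trimmed: List[Row] = []
    for row in rows[: last_row + 1]:
        kept = list(row[: last_col + 1])
        kept += [None] * (last_col + 1 - len(kept))
        trimmed.append(kept)
    return trimmed


sheet = [
    ["a", "b", None, None],
    [1, None, None, None],
    [None, None, None, None],  # blank row inside the data: kept
    [3, 4, None, ""],
    [None, None, None, None],  # blank row after the data: dropped
]
result = trim_trailing_blank(sheet)
```

In this sketch the third row survives because it lies within the data, which corresponds to the NaNs that :issue:`39808` was wrongly dropping; the last row and the last two columns are removed, which corresponds to the extra NaNs that :issue:`41167` was wrongly loading.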
thanks @ahawryluk
lgtm thanks @ahawryluk